The Transformation Distance: A Dissimilarity Measure Based On Movements Of

Authors

  • Jean-Paul Delahaye
  • Eric Rivals
Abstract

Evolution acts on DNA in several ways: by mutating a base, or by inserting, deleting, or copying a segment of the sequence [17, 18, ?]. Classical alignment methods deal with point mutations [19], while genome-level mutations are studied using genome rearrangement distances [1, 2, 8, 9]. Those distances are mostly evaluated as a number of gene transpositions. Here we define a new distance, called the transformation distance, which quantifies the dissimilarity between two sequences in terms of segment-based events (without requiring a preliminary identification of genes). Those events are weighted by their description length. The transformation distance from S to T is the Minimum Description Length among all possible scripts that build the sequence T, knowing the sequence S, with segment-based operations. The underlying idea is related to Kolmogorov complexity theory. Herein, we focus on the case where segment-copy, segment-reverse-copy, and segment-insertion operations are allowed. We present an algorithm that computes the transformation distance. A biological application to the Tnt1 tobacco retrotransposon is presented.
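The idea of a minimum-cost script that builds T from S can be illustrated with a small sketch. This is not the authors' algorithm: it is a plain dynamic program over prefixes of T, and the function name `transformation_script`, the three cost parameters, and their default values (in arbitrary "bits") are hypothetical stand-ins for real description lengths. It also assumes that a reverse-copy is a plain reversal of a segment of S.

```python
def transformation_script(s, t, segment_cost=8.0, insert_overhead=4.0, symbol_cost=2.0):
    """Toy sketch: cheapest script building t from s, with hypothetical costs.

    Allowed operations (assumed cost model, not taken from the paper):
      copy          a segment of s, flat cost `segment_cost`
      reverse-copy  a reversed segment of s, flat cost `segment_cost`
      insert        a literal segment, `insert_overhead` + `symbol_cost` per symbol
    """
    n = len(t)
    best = [0.0] + [float("inf")] * n   # best[j]: cheapest cost to build t[:j]
    back = [None] * (n + 1)             # back[j]: (previous index, op, argument)
    for j in range(1, n + 1):
        for i in range(j):
            seg = t[i:j]
            candidates = [("insert", seg, insert_overhead + symbol_cost * len(seg))]
            pos = s.find(seg)
            if pos >= 0:
                candidates.append(("copy", (pos, len(seg)), segment_cost))
            rpos = s.find(seg[::-1])
            if rpos >= 0:
                candidates.append(("reverse-copy", (rpos, len(seg)), segment_cost))
            for op, arg, cost in candidates:
                if best[i] + cost < best[j]:
                    best[j] = best[i] + cost
                    back[j] = (i, op, arg)
    # Walk the back-pointers to recover the sequence of operations.
    script = []
    j = n
    while j > 0:
        i, op, arg = back[j]
        script.append((op, arg))
        j = i
    script.reverse()
    return best[n], script


if __name__ == "__main__":
    cost, script = transformation_script("ACGTAC", "ACGTACGG")
    print(cost, script)
```

With these illustrative weights, a long shared segment is cheaper to copy than to spell out symbol by symbol, which is the intuition behind weighting events by description length. Because the cost depends on which sequence is taken as known, a measure of this kind need not be symmetric in S and T.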





Publication date: 1998